Genome Modeling System: A Knowledge Management Platform for Genomics

Authors

  • Malachi Griffith
  • Obi L. Griffith
  • Scott M. Smith
  • Avinash Ramu
  • Matthew B. Callaway
  • Anthony M. Brummett
  • Michael J. Kiwala
  • Adam C. Coffman
  • Allison A. Regier
  • Benjamin J. Oberkfell
  • Gabriel E. Sanderson
  • Thomas P. Mooney
  • Nathaniel G. Nutter
  • Edward A. Belter
  • Feiyu Du
  • Robert L. Long
  • Travis E. Abbott
  • Ian T. Ferguson
  • David L. Morton
  • Mark M. Burnett
  • James V. Weible
  • Joshua B. Peck
  • Adam Dukes
  • Joshua F. McMichael
  • Justin T. Lolofie
  • Brian R. Derickson
  • Jasreet Hundal
  • Zachary L. Skidmore
  • Benjamin J. Ainscough
  • Nathan D. Dees
  • William S. Schierding
  • Cyriac Kandoth
  • Kyung H. Kim
  • Charles Lu
  • Christopher C. Harris
  • Nicole Maher
  • Christopher A. Maher
  • Vincent J. Magrini
  • Benjamin S. Abbott
  • Ken Chen
  • Eric Clark
  • Indraniel Das
  • Xian Fan
  • Amy E. Hawkins
  • Todd G. Hepler
  • Todd Wylie
  • Shawn M. Leonard
  • William E. Schroeder
  • Xiaoqi Shi
  • Lynn K. Carmichael
  • Matthew R. Weil
  • Richard W. Wohlstadter
  • Gary Stiehr
  • Michael D. McLellan
  • Craig S. Pohl
  • Christopher A. Miller
  • Daniel C. Koboldt
  • Jason R. Walker
  • James M. Eldred
  • David E. Larson
  • David J. Dooling
  • Li Ding
  • Elaine R. Mardis
  • Richard K. Wilson

Abstract

In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad-hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.

Similar resources

Sputnik: a database platform for comparative plant genomics

Two million plant ESTs, from 20 different plant species and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST col...

Sharable DBMS for Genome Informatics

The primary aim of this project is to produce and disseminate a freely sharable, domain-specific database management system (DBMS) suitable for use as a component of a genome informatics system. Over the past three-and-a-half years we have developed and operated genome informatics systems for the genetic- and physical-mapping projects carried out at the Whitehead Institute/MIT Center for Genome...

Data Management for High-Throughput Genomics

Today's sequencing technology allows sequencing an individual genome within a few weeks for a fraction of the cost of the original Human Genome Project. Genomics labs are faced with dozens of TB of data per week that have to be automatically processed and made available to scientists for further analysis. This paper explores the potential and the limitations of using relational database system...

Platform-based product design and development: A knowledge-intensive support approach

This paper presents a knowledge-intensive support paradigm for platform-based product family design and development. The fundamental issues underlying the product family design and development, including product platform and product family modeling, product family generation and evolution, and product family evaluation for customization, are discussed. A module-based integrated design scheme is...

Parasite Genomics

Genes carry instructions for making proteins that affect the body's cells and their physical activity. They also play an important role in the occurrence of various characteristics in the body. Recently, scientists in the new field known as genomics have studied these genetic instructions. Genomics deals with the discovery of all the sequences in the entire genome of organisms and is used to s...

Journal title:

Volume 11, Issue

Pages: -

Publication date: 2015